Translationese: Between Human and Machine Translation

Author

  • Shuly Wintner
Abstract

Translated texts, in any language, have unique characteristics that set them apart from texts originally written in the same language. Translation Studies is a research field that investigates these characteristics. Until recently, research in machine translation (MT) has been entirely divorced from translation studies. The main goal of this tutorial is to introduce some of the findings of translation studies to researchers interested mainly in machine translation, and to demonstrate that awareness of these findings can result in better, more accurate MT systems. First, we will survey some theoretical hypotheses of translation studies. Focusing on the unique properties of translationese (the sub-language of translated texts), we will distinguish between properties that result from interference from the source language (the so-called “fingerprints” of the source language on the translation product) and properties that are source-language-independent and presumably universal. The latter include phenomena resulting from three main processes: simplification, standardization and explicitation. All these phenomena will be defined, explained and exemplified. Then, we will describe several works that use standard (supervised and unsupervised) text classification techniques to distinguish between translations and originals in several languages. We will focus on the features that best separate the two classes, and on how these features corroborate some (but not all) of the hypotheses set forth by translation studies scholars. Next, we will discuss several computational works showing that awareness of translationese can improve machine translation. Specifically, we will show that language models compiled from translated texts fit the reference sets better than language models compiled from originals. We will also show that translation models compiled from texts that were (manually) translated from the source to the target are much better than translation models compiled from texts that were translated in the reverse direction. We will briefly discuss how translation models can be adapted to better reflect the properties of translationese. Finally, we will touch upon some related issues and current research directions. For example, we will discuss recent work on identifying the source language from which target-language texts were translated. We will show that native language identification (in particular, of language learners) is a task closely related to the identification of translationese. Time permitting, we will also discuss work aimed at distinguishing between native and (advanced, fluent) non-native speakers.
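The claim that a language model "fits" a reference set is commonly measured by perplexity (lower is better). The sketch below is not the tutorial's method; it only illustrates the measurement, assuming add-one-smoothed unigram models and three invented toy strings (`translated`, `original`, `reference`) in place of real corpora. The function names are hypothetical.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Return (unigram counts, total token count) for a token list."""
    return Counter(tokens), len(tokens)


def perplexity(model, vocab_size, tokens):
    """Perplexity of `tokens` under an add-one-smoothed unigram model."""
    counts, total = model
    log_prob = 0.0
    for tok in tokens:
        # Add-one smoothing gives unseen tokens a nonzero probability.
        p = (counts[tok] + 1) / (total + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))


# Invented toy data (assumption): stand-ins for a translated corpus,
# an original corpus, and a human reference translation.
translated = ("in addition the committee has moreover decided "
              "that the proposal is accepted").split()
original = ("the committee said yes to the plan "
            "and everyone went home happy").split()
reference = ("moreover the committee has decided "
             "that the plan is accepted").split()

# Shared vocabulary so both models are smoothed over the same event space.
vocab = set(translated) | set(original) | set(reference)

ppl_translated = perplexity(train_unigram(translated), len(vocab), reference)
ppl_original = perplexity(train_unigram(original), len(vocab), reference)

print("LM from translated text:", round(ppl_translated, 2))
print("LM from original text:  ", round(ppl_original, 2))
```

On these toy strings the model built from the "translated" text assigns the reference a lower perplexity, but that outcome is built into the invented data. Real experiments would use higher-order n-gram models and large corpora with known translation direction.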


Similar resources

Information Density and Quality Estimation Features as Translationese Indicators for Human Translation Classification

This paper introduces information density and machine translation quality estimation inspired features to automatically detect and classify human translated texts. We investigate two settings: discriminating between translations and comparable originally authored texts, and distinguishing two levels of translation professionalism. Our framework is based on delexicalised sentence-level dense fea...

A Parallel Corpus of Translationese

We describe a set of bilingual English–French and English–German parallel corpora in which the direction of translation is accurately and reliably annotated. The corpora are diverse, consisting of parliamentary proceedings, literary works, transcriptions of TED talks and political commentary. They will be instrumental for research of translationese and its applications to (human and machine) tr...

Adapting Translation Models to Translationese Improves SMT

Translation models used for statistical machine translation are compiled from parallel corpora; such corpora are manually translated, but the direction of translation is usually unknown, and is consequently ignored. However, much research in Translation Studies indicates that the direction of translation matters, as translated language (translationese) has many unique properties. Specifically, ...

Improving Statistical Machine Translation by Adapting Translation Models to Translationese

Translation models used for statistical machine translation are compiled from parallel corpora that are manually translated. The common assumption is that parallel texts are symmetrical: The direction of translation is deemed irrelevant and is consequently ignored. Much research in Translation Studies indicates that the direction of translation matters, however, as translated language (translat...

Publication date: 2016